-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Impeller] Improve performance of CupertinoPicker with opacity peephole #40101
Conversation
TBD: whether we should do it like this, testing, et cetera |
// command wrapped in save layer. This would indicate something like an | ||
// Opacity or FadeTransition wrapping a very simple widget, like in the | ||
// CupertinoPicker. | ||
if (entity_pass->GetEntityCount() > 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern I have with working on this optimization now: subpasses are extra expensive right now on the metal backend due to the whole waitForScheduled, and also potentially due to lack of multithreaded encoding.
We might tune these thresholds too high and end up trading a bunch of raster time for decreasing returns of GPU utilization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! We're terribly underutilizing the GPU here. Sibling EntityPasses are fundamentally independent and can be evaluated in parallel on the CPU (not a huge impact in this case) and executed in parallel on the GPU (absolutely enormous impact in this case). Once we have a proper synchronization solution, we can remove all of the host<->device syncs that are blocking the CPU and saturate the GPU with parallelizable work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I think we added texture atlas sharing between EntityPasses at some point. That (and any other resource sharing between EntityPasses) needs to be reverted, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get around that by making glyph atlas construction happen eagerly? Otherwise, glyph atlas construction and upload takes a substantial amount of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, more specifically, can we push construction of the glyph atlas including all text frames into a different thread and then use whatever resource tracking mechanism we create to ensure the subsequent commands that depend on this are ordered correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get around that by making glyph atlas construction happen eagerly
Yeah I think that's probably the best option -- just render the whole thing at once and upload everything just before the EntityPass run. Then the uploaded atlases become read only for the frame and don't restrict parallelism.
impeller/entity/entity_pass.h
Outdated
@@ -52,6 +52,13 @@ class EntityPass { | |||
|
|||
void IterateAllEntities(const std::function<bool(Entity&)>& iterator); | |||
|
|||
/// @brief Iterates all entities on this pass, returning if there is a | |||
/// subpass. | |||
bool IterateAllFlatEntities(const std::function<bool(Entity&)>& iterator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of an unnecessary restriction, but I want to avoid traversing the entire tree. Ideally if all children accepted opacity and there was a subpass, we could check if the subpass accepts opacity and just push it one layer down.
This would help cases where we've nested FadeTransitions/Opacity
/// a way that makes accepting opacity impossible. It is always safe | ||
/// to return false, especially if computing overlap would be | ||
/// computationally expensive. | ||
virtual bool CanAcceptOpacity(const Entity& entity) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be a single call. Say "virtual bool InheritOpacity(...)". The return value can indicate if the operation succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to make sure the entities in a pass aren't overlapping too, so we can't combine them. basically there would need to be two levels of checks:
- For each entity in the pass, it needs to be able to accept opacity and not overlap other entities.
- Within an entities contents, its geometry needs to be non-overlapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
From investigation, its not that Skia has better coverage info, its that the display list heuristics don't consider overlap: flutter/flutter#122266 |
I think this is ready for review. It works for the cupertino picker case, and we can continue to invest in it should we find additional cases where we create way too many subpasses |
well, rebasing from github is hard! |
new_layer_pass.SetDelegate( | ||
std::make_unique<PaintPassDelegate>(paint, bounds)); | ||
|
||
// Only apply opacity peephole on default blending. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this deserve a TODO/bug to extend this to other blend modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can in general.
impeller/aiks/paint_pass_delegate.cc
Outdated
} | ||
bool all_can_accept = true; | ||
std::vector<Rect> all_coverages; | ||
auto had_subpass = entity_pass->IterateAllFlatEntities([&](Entity& entity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid over-capturing (don't use [&]
). Instead explicitly capture what you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
impeller/aiks/paint_pass_delegate.cc
Outdated
auto maybe_coverage = contents->GetCoverage(entity); | ||
if (maybe_coverage.has_value()) { | ||
auto coverage = maybe_coverage.value(); | ||
for (const auto& cv : all_coverages) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this even simpler by just bailing out if we don't get the structure in the comments above (i.e. clip, draw, restore)?
A small change above could lead to a very expensive change down here without realizing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a good way to check what kind of entity something is.
@@ -4,6 +4,7 @@ | |||
|
|||
#include "impeller/entity/contents/text_contents.h" | |||
|
|||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <iostream> |
Not sure if github is showing that right, but what I'm trying to articulate is that this line can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it!
return true; | ||
} | ||
auto glyph_positions = runs_[0].GetGlyphPositions(); | ||
if (glyph_positions.size() > 10) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this?
What happens if there are 100 TextFrame
objects each with 10 glyphs? This seems like the wrong level for optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is more than 1 run or more than 10 glyphs, we assume that its not worth computing the overlap - instead just assume the text frame has overlapping contents and don't apply the optimization. Seems reasonable to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why would we bail on 1 run that's 11 glyphs long, but chug along merrily on 1000 runs that are 10 glyphs long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L51, we don't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can't get multiple TextFrame
s in a single pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, we need to do these checks because text rendering behaves differently than solid color fills. The optimization has a limitation in the number of entities we check per frame which is now documented on the PaintPassDelegate
This also helps two places in the gallery:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
impeller/entity/entity_pass.h
Outdated
@@ -52,6 +52,13 @@ class EntityPass { | |||
|
|||
void IterateAllEntities(const std::function<bool(Entity&)>& iterator); | |||
|
|||
/// @brief Iterates all entities on this pass, returning if there is a | |||
/// subpass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "Iterate entities in this pass up until the first subpass is found. This is useful for limiting look-ahead optimizations."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
impeller/entity/entity_pass.h
Outdated
@@ -52,6 +52,13 @@ class EntityPass { | |||
|
|||
void IterateAllEntities(const std::function<bool(Entity&)>& iterator); | |||
|
|||
/// @brief Iterates all entities on this pass, returning if there is a | |||
/// subpass. | |||
bool IterateAllFlatEntities(const std::function<bool(Entity&)>& iterator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit, but maybe rename to "IterateUntilSubpass"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
(My concerns have been addressed) |
…122746) * 3c3cbc738 Manual roll Dart SDK from c766fffb626e to 3b109a04f783 (9 revisions) (flutter/engine#40296) * dee0ae590 [Impeller] Make matrix image filters work as expected with nested saving layers (flutter/engine#40299) * 835759f4c Roll Skia from 4d90ba479527 to 3b9131c65c01 (5 revisions) (flutter/engine#40304) * bbde3a77b Roll Fuchsia Linux SDK from BRE9jdqYpdkbU0j7H... to YaWqKKuj-fAqfpKCm... (flutter/engine#40306) * 1a7e7c468 Roll Dart SDK from 3b109a04f783 to 5c210933cdfe (2 revisions) (flutter/engine#40307) * 66a3324cf Analyze more shaders (flutter/engine#40285) * 405c8513e [Impeller] Improve performance of CupertinoPicker with opacity peephole (flutter/engine#40101) * 9fc3246bf Reland "Make FlutterTest the default test font" (#40188) (flutter/engine#40245) * 3ed9f1236 Reland: "Added wide-gamut color support for `ui.Image.toByteData` and `ui.Image.colorSpace`" (flutter/engine#40312) * e143b309c Bump lower Dart SDK constraints to 3.0 (flutter/engine#40178) * 086b1a022 [impeller] implement GetPositionUVBuffer (flutter/engine#40248) * ec151bf2c Revert Dart SDK to c766fffb626e (flutter/engine#40315)
Implements an EntityPassDelegate that applies an opacity peephole optimization. This has been tuned to be extremely conservative, and does not successfully remove all opacity layers from the cupertino example linked. Skia on the other hand does manage to, so perhaps it is using more precise coverage information from the glyphs.
Impeller Before
Impeller After
Skia
flutter/flutter#121476